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We consider the nonlinear differential equation ^ = L(u) + f(t;. 

n 

Use of Galerkin FEM with u(x, t) ~ v(x, t) = E Y.(t)N.(x), where the 

j = I J J ~ 

N.(x) are specified basis functions, results in the implicit system of 

J ~ n . • 

ordinary differential equations, (*) E A., y. - B.(y, ,..., y ,f) = 0, 

j = j * J J ' * ' ' 



i =1, ..., n, where A . = <N., N.>, B. = <N., l(v) + f>. 

"1 J J * 



The method chosen for solution of stiff systems (*) is a version 
of Gear's method which solves the system in its implicit form. This 
leads to the necessity of being able to solve (repeatedly) linear 
algebraic equations whose coefficient matrix has the same sparse and 
banded nature as (A..). 

' J 

Storage requirements for various orders of polynomial triangular 
elements under compact storage mode, profile storage mode, and banded 
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symmetric storage mode are given and compared. For large systems (*), 
compact storage mode leads to significantly reduced requirements. 
Consideration of the linear algebraic systems which arise in Gear's 
method reveals that iteration should be computationally efficient. A 
comparison between various solution methods is given for a nonlinear 
reactor dynamics problem. Associated with each solution method is a 
different storage mode. 

1 . Discription of the Problem 

We consider a nonlinear p.d.e. of the parabolic type, 

l^jr = L(u) + f x e D, t e x (1) 

with appropriate initial and boundary conditions, and L denotes spatial 
operators. In accordance with a weighted residual FEM formulation, an 
approximate solution v(x,t) in the form 

n 

u(x, t) ='v(x,t) = Z YjU)Nj(x) U) 

j = 1 

is assumed. In Eq. (2), N.(x) are a set of specified interpolation 

w 

functions with local support, and the y-(t) are the solution coefficients 

J 

to be determined. Setting the residual function 

R(x, t) = ^ - L( v ) - f (3) 

orthogonal to each of the weighting functions W. (x), i =1, ..., n, 
i .e. 

<R, W.> = u, i = 1 , . . . , n (4) 
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yields the system of nonlinear o.d.e.. 




n 



( 5 ) 



with initial conditions, where 




( 6 ) 



B, = <L(v) + f, W,> 

I X 1 



Out objective is to select a method of solution of Eq. (5) which is 
efficient with respect to memory core requirement, and computational 
effort. With regard to core requirement, an efficient strategy should 
take into account the nature of the (A..) matrix. If the weighting 
functions have local • support, A will be sparse and banded. If a Galerkin 
formulation is employed, W. = N . , and the (A^j)matrix is symmetric. In 
general the sparseness of (A..) increases with finer mesh discretization, 

* J 

as well as space dimensionality. Bandwidth and sparseness increase with 
higher order polynomial interpolation elements, but fewer are required to 
provide an accuracy achieved by lower order elements. The question of 
which is more efficient, higher or lower order elements, is not addressed 
here. 

Attention is given here to a stiff system arising from a FEM formu- 
lation of a two dimensional nonlinear nuclear reactor dynamics problem 
[1 ] . Here l(u) is given by 



with appropriate initial and boundary conditions. In this case, Eq. 



2 2 
L(u) = -au + bu + cA u 



(7) 



2 

(5) becomes, after an integration by parts on the A term 







( 8 ) 
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where 






A. . 


= <N . , N .> 












i J 










C ijk 


= <N r N jlV 

- 3N . 3N . 
< 3~‘ 3x J> 


3N . 

+ < , 

ay * 


3Nj 


► 


(9) 




3 y > 







i 



2. Solution Techniques for the Implicit System of Q.D.E.'s 

Consideration of various schemes for the solution of the implicit 
system of ordinary differential equations given by Eq (5) reveals that 
no matter what type of scheme is employed, it will involve the solu- 
tion of a system of algebraic equations, possibly nonlinear if (3.) is 
nonlinear. Use of even a simple scheme for explicit systems of dif- 
ferential equations, e.g., Euler's method, requires repeated solution 

of a system of linear equations with coefficient matrix (A..) for the 

^ J 

Yj, given y, , ..., y n , and t. If a predictor-corrector method (or 
any method involving derivatives at the new time, generally called an 
implicit numerical method) for explicit systems of differential equa- 
tions is used, a second system of algebraic equations arises for 'the 
dependent variables at the new time. Because of this second system of 
algebraic equations it is best to avoid having to solve (5) for 
derivatives by employing an ordinary diffential equation solver 
designed for implicit systems of equations. 

Given that an implicit method will be employed to solve the system 
(5) there are three levels of matrix storage that are required: (1) 

That required by the system matrices (A..) and (B.) (We are not 

T J 1 

being specific here about the form of (B..); it may involve several con- 
stant matrices or may be a function of time); (2) That required by the 
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differential equation solver; and (3) That required to represent the 
algebraic system of equations for y^ , .... y at the next time in the 
form required by the algebraic equation solver being used. The 
hierarchy of storage levels is shown schematically in Figure 1, along 
with possible options for differential equation solvers and algebraic 
equation solvers, with the preferred storage mode shown in parenthesis. 

Level 1: S ystem Matrices : (A..), (Compact) 

1 sJ 

(3 • ) » (Compact or ?) 
y 1 

Level 2: ODE Solver : e.g. Implicit Gear (Full, 

Crank-Nicol son nx?) 
v others 




i (See Linear) (Compact) 

l ; 

Figure 1 

It is difficult to show all the possible options and Figure 1 is not 
meant to exclude any, but rather to emphasize several points. (1) The 
system matrices are used only in evaluation of the left side of (5) and 
can be stored in any form, some form of compact storage being efficient. 
(2) The differential equation solver will have its own requirements for 
storing the solution values, past history, and auxiliary storage. 
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(3) The choice of solution method for the algebraic equations will 
determine the type of storage required at level 3. In some instances 
the latter choice may be determined by the differential equation solver, 
and could require no additional storage in some cases, but a more usual 
situation will be where at least one matrix must be stored. 

Because the problems in which we are interested are typically stiff 

we were led to Gear's method, which performs well. This method was 

/ 

used in a form designed for implicit systems of differential equations 

[2], and is based on [3]. Gear's method is a variable order, variable 

stepsize, predictor-corrector scheme. The derivatives at the new time 

are approximated by a backwards difference formula, and the resulting 

corrector equation is solved by a quasi -Newton's method. This leads 

to repeated solution of equations of the form J6y = p, where the 

solution <5y represents incremental corrections to the solution values. 

s 3B • 

For Eq. (5), J = )., where h is the current setpsize and s 

3 

is a constant dependent on the current order formula being applied. 

Our version is designed to facilitate easy incorporation of what- 
ever solution scheme and associated storage scheme is suitable for 
these linear systems. Since the user must supply a subprogram to 
evaluate the matrix J, it is then relatively simple for the user 
to store the matrix in a form compatible with the equation solver 
being used. 

In our scheme, the amount of storage required at level 2 is 
approximately 20 n words. Level 1 storage is dependent on the 
problem, and level 3 storage on the linear equation solver incor- 
porated into the method. The details for a specific problem are 
discussed in Section 4. 
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3. Storage Schemes 



The most common method of storing matrices in FEfi, is the banded 
storage scheme, whereby the bandwidth (or half bandwidth in the case 
of symmetric matrices) terms are stored. Some reduction in storage is 
obtained by profile (or skyline) storage. In this scheme, some of the 
zero terms within the band are eliminated. Band and profile storage 
are schematically shown in Figure 2. 





Figure 2 

For large systems, the storage allocated to zero terms by either the 
band or profile scheme comprises a large fraction of the total storage. 
Thus, a compact storage scheme, which stores only the non-zero coeffi- 
cients of a matrix provides a substantial reduction in core require- 
ments for large systems. 

The implementation of compact storage requires two integer array 

vectors, say ISTART and NAME, and a vector of the non zero coefficients, 
t h 

say AA. The i tn integer entry in ISTART is the number q^ , where 

i - 1 

q,- = Z Pi + 1 (10) 

j = 1 J 

and P. is the number of terms in the j th equation (i.e. the number of 

j. 

nodes connected to the j node). In n is the number of unknowns in 
the system, the length of ISTART is (n +1). ISTART then, is a pointer 
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vector whose j th term locates the initial position in the AA vector of 

£ h 

the contributing coefficients to the j equation. The M x 1 NAME 
vector, where 

"V / 1 ("> 

is composed of n successive vector blocks of variable length P^.. The 

P. integers in the block of NAME identify the contributors to the 
J * / 

j n equation. The M x 1 vector AA, contains the real non-zero coeffi- 
cients of the n x n A matrix, arranged in the same contiguous block 
arrangement as the NAME vector. 

A comparison of the core requirements of a symmetric banded matrix 
using banded, profile and compact storage follows. To fix ideas, we 
consider a simple rectangular domain with R rows of elements, and S 
columns of elements. The class of triangular elements with polynomial 
interpolation are considered. The formulas presented are for the case 
where interior nodes are condensed out. The following notation is used, 
n - the number of unknowns 
n s - the half bandwidth for a symmetric matrix 
t - the order of polynomial interpolation 
R - the number of rows of elements in the rectangular grid 

S - the number of columns of elements in the rectangular grid 

N s - symmetric band storage 
Np - profile storage 
N c - compact storage 
a - bytes per word for real numbers 

B - bytes per word for integer numbers 

To obtain minimum bandwidth, numbering of nodes is sequential in the 
vertical direction if R < S, and vice versa if S > R, for profile and 
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banded storage. The numbering sequence for compact storage is irrele- 
vant. For an R x S rectangular grid the number of unknowns is 

n = RS(3t - 2) + (R + S)(2 - 2t) + (t - 1) (12) 

For very large systems, i.e. RS >> (R + S), 

n : RS(3t - 2) (13) 

The core requirement for each of the storage schemes, for R < S, is 

a) banded storage 

N$ = ann s (14) 

where, 

n $ = 3Rt - 2R - t + 3 (15) 

For very large systems 

N $ : aR 2 S for t = 1 (16) 

N : 16aR 2 S for t * 2 

s 

N $ : 49aR 2 S for t = 3 

b) profile storage 

N p = N s - aQ (17) 

where 

g = (R - mLl-11 t 2 + (R - 2) (R - l)(t - l)t (S - 1) 

2 (18) 

(R - 2) (t-1 )t( S - 1) + [2R(t - 1) + 1 ][2R( t - 1) + 2 ] (: _ 2) 
+ 2 2 



For very large systems 



N p : aR 2 S 


t = 1 




Np : 1 2aR 2 S 


t = 2 


(19) 


N p : 35aR 2 S 


t = 3 
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c) compact storage 



N = aM + 8(M + n + 1 ) 



( 20 ) 



where 



M = RS(1 5t 2 - 6t - 2) + (R + S)(-14t 2 + 8t + 2) 



+ (13t - lOt - 1) 



For very large systems 



( 21 ) 



N c : RS(7a + 88) 

N c : RS(46a + 506) 



i c Z RS(115a + 1 223) 



t = 1 

t = 2 
t = 3 



( 22 ) 



d) Comparison of N s , Np and N c for large systems. 

It is noted that banded and profile storage are proportional 

2 

to R S, while compact storage is proportional to RS. The following 
formulas compare the relative core requirements for banded, profile 
and compact storage schemes. 

i) Savings of profile compared to banded storage 





' 0.0 


t = 


1 




N S - N p : 1 


0.25 


t = 


2 


(23) 


N s 


0.29 


t = 


3 





ii) Savings of compact compared to profile 

r 

t-1 



1 l §8 

R ' ctR 



N N ~ , 
P ~ c - J 

N 

P 



i 23 25e 

1 ' 6R ' 



6aR 

1228 



23 

1 ' 7R ' 35aR 



t = 1 
t = 3 



(24) 
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To fix ideas, say 3 = £a, then for large systems 



N P " N s 



11 

R 



\ 1 " 1 It 



21 

12R 



352 

70R 



t = 1 
t = 2 

t = 3 



( 25 ) 



It should be noted from Eq. (24), that banded and profile storage is 

less than compact storage for small systems. For example, in the case 

7 8S 

of t = 1 , banded and profile storage is more efficient when 1 < (^- + ^) 

i.e., when R < 11 in the case t = 1 and 3 = a/2. 

4. Numerical Results for an Example Problem 

We consider the example given by Eq. (7), resulting in the ordinary 
differential equations (3). The domain was a rectangle which was 
discretized with 11 rows and 12 columns, giving 132 nodes and 220 ele- 
ments. There were 22 boundary nodes (fixed values). Using linear 
triangular elements, a system of 110 differential equations was obtained 
The three dimensional array (C.^) required special consideration for 
its storage. We noted that 



n n 
Z Z C, 



n n 

;Y . = z : D,,, 



A * jii w uw 



where 



°i jk 



C ijk 



C . ., + C., . 

ijk lkj 



j = k 



j < k 



(26) 



Because of the regular rectangular grid employed here, each 
equation contains no more than 7 terns. To facilitate handling of the 
nonlinear term, seven entries were allotted to each block of the NAME 
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array (i.e. P. = 7 for all j). For equations with q contributing terms, 
where q < 7, there were (7 - q) null entries in the NAME array. For the 
nonlinear term given by Eq. (26), the number of non zero coefficients in 
any equation is no more than 28, the number of combinations of seven 
nodes taken two at a time (i.e. C. + C. ., j < k) plus the seven 

1 J K 1 K J 

diagonal terms C.... Thus, the nonlinear term requires 28n words. 

I yj yj 

Each of the (A..) and (B-.) matrices requires 7n words. Total level 1 

1 J I J X 

storage required is 42na bytes plus 7n3 bytes for the NAME array; the 
ISTART array is not required for this modified compact storage scheme. 

rS, 



The J matrix that arises; in this problem is (rA. . - bA. . + CB. . + 

** 1 \J 1 J • J 



n 

2a L 



L The matrix can be stored in compact form using the same 



NAME array as for (A. .). 

Three different linear equation solvers were considered, along with 
their associated storage schemes for J. The first was the IMSL pair 
LUDAPB/LUELPB for matrices in symmetric banded storage form. The half 
bandwidth for our sample problem was 12, thus in this case 12n = 1320 
words were required. No additional working storage is required by 
LUDAPB since it performs an in-place decomposition of J. In the 
general case, storage requirements for the J matrix are those for a 
symmetric banded matrix as given by Eqs. (14) and (15). 

The second equation solver used was an iterative method, SOR, for 
which compact storage was used. This required 7n = 770 words. In 
general., storage requirements for the J matrix in compact form are 
given by Eq's (20 and (21). SOR, of course, does not require any 
additional working storage. 

The third equation solver considered was the symmetric form of 
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the Yale Sparse Matrix Package [4], This required J to be stored in a 
symmetric form of the compact storage described in Section 3, and for 
our case required 399 words to store J, plus 510 words to store the 
NAME and ISTART arrays. In addition, approximately 1500 words were 
required to store the decomposition of J, along with the NAME and 
ISTART arrays. In general, storage for the Yale Sparse Matrix Package 
should be much less than that for the profile scheme given by 
Eqs (17) and (18), since a reordering of rows/columns to minimize 
fill-in during the matrix factorization is done. 

The particular problem we have used as an example was designed to 
illustrate the feasibility of using the three different storage/solution 
schemes, and the computational times and storage here are not likely 
to be representative of what might happen in larger problems. In 
particular, the relatively small bandwidth favors the symmetric band 
storage mode in computational effort. 

The SOR method must converge very rapidly to be competitive in 
computational effort, since about 7n operations are required per 
iteration, whereas about 2n s n (after factorization of J) are required 
for solution with symmetric banded matrices of half bandwidth n . 
Somewhat fewer operations are required for the Yale Sparse Matrix 
Package. For our case, SOR requires more computational effort than 
direct methods when the number of iterations for convergence exceeds 
4 (i.e. when 7nNj > 2n s n, where n s is 12 and Nj is the number of 
iterations), although this is offset by the need to factor J each time 
it is recomputed. Since the solution <5y of the system Jay = p 
method increments for the corrector equation, the accuracy requirements 
are low, and SOR requires few iterations for convergence. The results 
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of our example should be observed with the above considerations in mind. 
All times were obtained on the IBM 360 model 67, using the Fortran H 
compiler. 



rms accuracy 
required in 
Gears Method 


LU DAPB/UJ ELP3 




YALE 


SOR 


.1 


30.0 




34.8 


26.2 


.01 


48.7 




47.1 


50.1 


.001 


66.6 




71 .0 


57.0 



Table 1 

For systems with large bandwidths, we expect the computational 
effort required for both the Yale Sparse Matrix Package and SOR to be 
superior to the symmetric banded scheme. 
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